Optimizing expression of a single copy transgene in C. elegans

The utility of single copy transgenic insertions in C. elegans is often limited by low expression. We examined the effects of modifying the trans-splicing signal, the Kozak ribosome binding site, the N-terminal amino acid of the reporter and the 3′ UTR sequences on the expression level of a mec-4 promoter GFP transgene. The trans-splicing signal and the 3′ UTR had the most dramatic effects on expression, while modifying the Kozak signal or the N-terminal amino acid had less influence.

We examined the effects of modifications to the trans-splicing sequence, the Kozak ribosomal binding site that promotes translational initiation (Kozak 1987), the N-terminal amino acid controlling N-end rule-mediated protein stability, and the 3′ UTR sequences, which regulate message stability, on steady-state GFP levels in a set of single copy insertions at the same position. Our results document that modifying each of the three components can influence expression levels.
We used an efficient RMCE protocol (Nonet 2020) to create the transgenic animals. Modified versions of a mec-4 promoter GFP-C1 tbb-2 3′ UTR DNA construct were created in an RMCE integration vector using a Golden Gate cloning approach, then integrated using a standard injection protocol. After outcrossing, the steady-state level of GFP in the PLM and ALM soma of L4 animals was quantified (Figure 1).
Modification of the sequence upstream of the ATG to contain a computationally determined optimal C. elegans consensus Kozak sequence (Blumenthal and Steward 1997) reduced the steady-state level of GFP. However, since the replacement also alters the trans-splice acceptor, we also inserted 3 A bases to add a consensus Kozak site without disrupting the splicing signal. This modification had no influence on expression. Modification of the trans-splice acceptor sequence from TTATAG to the consensus TTTCAG increased steady-state levels ~two-fold, consistent with studies showing that disruption of the trans-splice signal reduces translation efficiency in vivo (Yang et al. 2017). Disrupting the trans-splice signal by mutating the -1 to -5 sites to a non-consensus sequence (TCCACC) had the opposite effect, reducing the expression level about two-fold.
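To make the sequence comparison concrete, the short Python sketch below counts mismatches between each acceptor variant mentioned above and the TTTCAG consensus. The function name and the variant labels are illustrative choices. The mismatch count is only a crude descriptive proxy; it is not a measure used in this study and is not claimed to predict expression.

```python
CONSENSUS = "TTTCAG"  # consensus trans-splice acceptor quoted in the text

# Labels are illustrative; the sequences are those given in the text.
VARIANTS = {
    "original": "TTATAG",
    "consensus": "TTTCAG",
    "disrupted": "TCCACC",
}


def mismatches(seq, ref=CONSENSUS):
    """Count positions at which seq differs from ref (Hamming distance)."""
    if len(seq) != len(ref):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq.upper(), ref.upper()))


if __name__ == "__main__":
    for label, seq in VARIANTS.items():
        print(f"{label:10s} {seq}  mismatches vs {CONSENSUS}: {mismatches(seq)}")
```

By this count the original acceptor differs from the consensus at two positions and the disrupted sequence at five.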
Protein expression levels are also regulated by the N-terminal sequence of proteins through a process known as the N-end rule (Gonda et al. 1989). Changing the first post-Met amino acid of the GFP-C1 coding sequence to valine, the most stabilizing amino acid, improved expression, whereas changing it to the unfavorable amino acid glutamine reduced expression slightly. Although these effects are not dramatic, conformity to the N-end rule is complex, depending on additional factors such as the inherent structure of the N-terminal region and the presence of lysine residues for ubiquitin modification (Varshavsky 2011). Thus, the effects of N-terminal residues may be much more significant for proteins other than GFP.
In addition, it is well documented that expression levels in C. elegans are often strongly influenced by 3′ UTR sequences, especially in germline tissue (Merritt et al. 2008). We replaced the tbb-2 3′ UTR with multiple widely used 3′ UTRs as well as the native mec-4 3′ UTR and the neuronal unc-10 3′ UTR. The effect on GFP levels ranged over 10-fold. The let-858 3′ UTR was the most efficacious and the unc-10 3′ UTR the least. Note that we used a short unc-54 3′ UTR rather than the traditional longer sequence that contains the aex-5 promoter and often yields posterior intestinal background expression (Silva-García et al. 2019). These experiments highlight the robust influence 3′ UTRs have on expression levels. Which 3′ UTRs are most favorable is likely to be cell-type specific, so the same UTRs may not be the most robust in other cell types.
To assess whether the effects are additive, we created a transgene that incorporated the most effective trans-splicing and protein stability signals and the most effective 3′ UTR. Disappointingly, the multi-mutant construct expressed less strongly than the individual modified promoters, indicating that the elements interact with each other in complex ways to determine the overall expression level. Since this result was unexpected, we quantified the expression levels of 4 independently isolated identical insertions of the multi-mutant construct, and all behaved very similarly. This supports our prior experience that the jsTi1453 landing site is not significantly influenced by epigenetic factors under standard laboratory conditions.
Finally, we manipulated the mec-4 promoter in a fashion that is unlikely to be easily performed for most promoters. Extensive analysis of the mec-3, mec-4 and mec-7 promoters as well as other mec-3/unc-86 regulated genes (Xue et al. 1992; Duggan et al. 1998; Zhang et al. 2002) has defined a consensus binding site for the critical UNC-86/MEC-3 transcription factor heterodimer [CATN(3-4)AAATGCAT]. The mec-4 promoter is known to contain one such sequence: CATtatAAATGTAT. We inserted an additional binding site 100 bp upstream of the known binding site by introducing 27 bp that contain CATaagAAATGTAT, a sequence identical to the native binding site in the mec-4 promoter at the critical bases (capitals). Introduction of this sequence increased expression over two-fold compared to the native promoter and was the most potent of the manipulations performed.
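The consensus described above can be turned into a simple pattern search. The Python sketch below is illustrative only and is not part of the analysis reported here. It makes one explicit assumption: the C in AAATGCAT is relaxed to C or T so that the native mec-4 site (AAATGTAT) also matches. The function name and the toy test sequence are hypothetical, and only the given strand is scanned.

```python
import re

# Consensus from the text: CAT N(3-4) AAATGCAT.
# ASSUMPTION: the C in AAATGCAT is relaxed to [CT] so that the native
# mec-4 site (AAATGTAT) is matched as well.
# The lookahead lets the scan report overlapping candidates.
SITE = re.compile(r"(?=(CAT[ACGT]{3,4}AAATG[CT]AT))", re.IGNORECASE)


def find_sites(seq):
    """Return (0-based start, matched sequence) for each candidate site
    on the given strand; the reverse complement is not searched."""
    return [(m.start(), m.group(1).upper()) for m in SITE.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence: the two sites quoted in the text joined by
    # arbitrary filler bases. This is NOT the real mec-4 promoter.
    toy = "GGGG" + "CATaagAAATGTAT" + "CCCCCCCC" + "CATtatAAATGTAT" + "GGGG"
    for start, site in find_sites(toy):
        print(start, site)
```

Running the same function over a real promoter sequence would list candidate sites, which would still need to be judged against the published binding data.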
While our studies identify some modifications that can be introduced into transgenic constructs to increase expression, they do not define a clear set of rules that can be implemented to ensure high expression. Nevertheless, the simplicity of RMCE integration should make altering transgenic constructs a more realistic option to attempt before resorting to creating multicopy integrated transgenes.

Methods
C. elegans was maintained on NGM agar plates spotted with OP50 at 22.5°C, or at 25°C during the RMCE protocol.

Microscopy
For quantification of GFP signals, homozygous L4 hermaphrodite animals were mounted on 2% agar pads in a 2 µl drop of 1 mM levamisole in phosphate buffered saline and imaged on an Olympus (Center Valley, PA) BX-60 microscope equipped with a Qimaging (Surrey, BC Canada) Retiga EXi monochrome CCD camera, a Lumencor AURA LED light source, Semrock (Rochester, NY) GFP-3035B and mCherry-A-000 filter sets, and a Tofra (Palo Alto, CA) focus drive. The system was run using Micro-Manager 2.0β software (Edelstein et al. 2014), and images were acquired with a 40X air lens at 20% LED power with 200 ms exposures. PLM soma and ALM soma signals were quantified using the FIJI version of ImageJ software (Schindelin et al. 2012) as described in Nonet (2020).
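Once per-cell intensities have been measured, constructs are naturally compared as fold changes relative to a reference construct. The Python sketch below shows one simple way to do this, assuming a hypothetical table of background-corrected soma intensities keyed by construct name. It is not the quantification procedure of Nonet (2020), and the construct labels and numbers in the usage example are placeholders, not measurements from this study.

```python
from statistics import mean


def fold_change(intensities, reference):
    """Divide the mean intensity of each construct by the mean intensity
    of the reference construct.

    intensities: dict mapping construct name -> list of per-cell values
    reference:   key of the construct used as the baseline
    """
    ref_mean = mean(intensities[reference])
    if ref_mean == 0:
        raise ValueError("reference mean is zero")
    return {name: mean(vals) / ref_mean for name, vals in intensities.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT data from this study.
    example = {
        "reference": [100.0, 120.0, 110.0],
        "variant_A": [210.0, 230.0, 220.0],
    }
    for name, fc in fold_change(example, "reference").items():
        print(f"{name}: {fc:.2f}-fold")
```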

Plasmid constructions
Modified versions of the NM3732 pLF3FShC mec-4p GFP-C1 tbb-2 plasmid were constructed by SapI Golden Gate (GG) assembly, inserting modified components from DR274 insert constructs as outlined below. DR274 entry vectors were created by inserting PCR fragments into the vectors using a BsaI GG reaction. Assembly reactions were performed as described in Nonet (2020).

Reagents
Plasmids and worm strains are available by request from MLN and will be submitted to Addgene and the Caenorhabditis Genetics Center if demand levels warrant it.